A Scalable Data Analytics Algorithm for Mining Frequent Patterns from Uncertain Data

Authors

  • Richard Kyle MacKinnon
  • Carson Kai-Sang Leung
  • Syed Khairuzzaman Tanbeer
Abstract

With advances in technology, massive amounts of valuable data can be collected and transmitted at high velocity in various scientific, biomedical or engineering applications. Hence, scalable data analytics tools are in demand for analyzing these data. For example, scalable tools for association analysis help reveal frequently occurring patterns and their relationships, which in turn lead to intelligent decisions. While a majority of existing frequent pattern mining algorithms, including FP-growth, handle only precise data, there are situations in which data are uncertain. In recent years, researchers have turned their attention to frequent pattern mining from uncertain data. UF-growth and UFP-growth are examples of tree-based algorithms for mining uncertain data. However, their corresponding tree structures can be large. Other tree structures for handling uncertain data may achieve compactness at the expense of loose upper bounds on expected supports. To solve this problem, we propose (i) a compact tree structure that captures uncertain data with tighter upper bounds than the aforementioned tree structures and (ii) a scalable data analytics algorithm that mines frequent patterns from our tree structure. Experimental results show the tightness of the upper bounds on expected supports provided by our algorithm.
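
For readers unfamiliar with the term, the expected support of a pattern in uncertain data is commonly defined as the sum, over all transactions, of the product of the existential probabilities of the pattern's items, assuming the items occur independently. The sketch below illustrates only that standard definition with a brute-force search; it is not the tree structure or the mining algorithm proposed in the paper. The toy database `DB` and the function names `expected_support` and `frequent_patterns` are hypothetical, chosen for illustration.

```python
from itertools import combinations
from math import prod

# Hypothetical toy database: each transaction maps an item to its
# existential probability (the chance that the item is really present).
DB = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "c": 0.5},
    {"b": 0.5, "c": 1.0},
]


def expected_support(pattern, db):
    """Sum, over transactions containing every item of the pattern, of the
    product of the item probabilities (items assumed independent)."""
    return sum(
        prod(t[item] for item in pattern)
        for t in db
        if all(item in t for item in pattern)
    )


def frequent_patterns(db, minsup):
    """Brute-force enumeration of all patterns whose expected support is at
    least minsup. Exponential in the number of items; illustration only."""
    items = sorted({item for t in db for item in t})
    result = {}
    for k in range(1, len(items) + 1):
        for pattern in combinations(items, k):
            support = expected_support(pattern, db)
            if support >= minsup:
                result[pattern] = support
    return result
```

Calling `frequent_patterns(DB, 1.0)` returns the patterns whose expected support reaches the threshold. Tree-based miners such as those named in the abstract avoid this exhaustive enumeration, and upper bounds on expected support are what allow them to prune candidates safely.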

Similar resources

Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows

Today, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that appear frequently in the data set and mark them as frequent patterns so that users can make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...

Review of Algorithm for Mining Frequent Patterns from Uncertain Data

Mining frequent patterns from traditional databases is an important research topic in data mining, and researchers have achieved tremendous progress in this field. However, with high volumes of uncertain data generated in distributed environments in many biological, medical and life science applications in the past ten years, researchers have proposed different solutions in extending the conventiona...

Data sanitization in association rule mining based on impact factor

Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...

A Survey Paper on Frequent Pattern Mining for Uncertain Database

A number of existing algorithms mine frequent patterns from certain or precise data. Nowadays, however, the demand for uncertain data mining has increased, because there are many situations in which data are uncertain. For frequent pattern mining from uncertain data, two main approaches have been proposed: the level-wise approach and the pattern-growth approach. The level-wise approach uses the ge...

Vertical Mining of Frequent Patterns from Uncertain Data

Efficient algorithms have been developed for mining frequent patterns in traditional data where the content of each transaction is definitely known. There are many applications that deal with real data sets where the contents of the transactions are uncertain. Limited research work has been dedicated to mining frequent patterns from uncertain data. This is done by extending the state of the art ho...


Publication date: 2014